{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example-5-MC-GPA-IRS-CVA\n",
    "# Author: Matthew Dixon\n",
    "# Version: 1.0 (28.4.2020)\n",
    "# License: MIT\n",
    "# Email: matthew.dixon@iit.edu\n",
    "# Notes: tested on Mac OS X running Python 3.6.9 with the following packages:\n",
    "# scikit-learn=0.22.1, numpy=1.18.1, matplotlib=3.1.3\n",
    "# Citation: Please cite the following reference if this notebook is used for research purposes:\n",
    "# Dixon M.F., Halperin I. and P. Bilokon, Machine Learning in Finance: From Theory to Practice, Springer Graduate # Textbook Series, 2020."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "# Overview\n",
    "This example demonstrates the application of Gaussian processes to CVA modeling on a counterparty portfolio of IRS contracts with 11 currencies and 10 FX processes. The notebook simulates the GP mark-to-market cube of the portfolio, and compares the Expected Positive Exposure when using a GP derivative pricing model versus a BS pricing model and calculate the CVA${}_0$.\n",
    "\n",
    "## Data preparation\n",
    "In the data, the interest rates appear first and then the FX processes, so that if $0 \\leq j \\leq n_r-1$, where $n_r$ is the number of rates, then the $j^{~th}$ process is the $j^{~th}$ interest rate and, ignoring $j=0$, the $(n_r+j-1)^{th}$ process is the FX process whose foreign currency is the $j^{~th}$ one.\n",
    "\n",
    "For example, in the case of 3 currencies (and therefore 2 FX rates) we would have a spatial arrangement like this:\n",
    "\n",
    "|  rate 0  |  rate 1  |  rate 2  |  FX 0  |  FX 1  |\n",
    "\n",
    "\n",
    "In the rates cube, $\\left[i, j, k\\right]$ refers to the $i^{~th}$ coarse time step, $j^{~th}$ interest rate process if $j\\leq n_r$, or the $({j-n_r})^{~th}$ FX process otherwise, and the $k^{~th}$ Monte Carlo scenario.\n",
    "\n",
    "## Data Download\n",
    "\n",
    "Please download the zipped .npz files from the following link, unzip them, and put them directly into the data folder `ML_Finance_Codes/data/`\n",
    "\n",
    "https://www.dropbox.com/sh/2dd1aida38nanwl/AACaCt1fXX2xr6I5T4B9U3_1a?dl=0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "from sklearn import gaussian_process\n",
    "from sklearn.gaussian_process.kernels import ConstantKernel, RBF, Matern\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "mtm_irs = np.load('../../data/mtm_irs.npz')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "rates_and_fx = np.load('../../data/rates_and_fx.npz')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "irs_params = np.load('../../data/irs_params.npz')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "rate_params = np.load('../../data/rate_params.npz')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Aggregate underlying data together"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The aggregation of the data is as follows:\n",
    "- Select counterparty portfolio (1 of 5)\n",
    "- Select a IRS contract `i`\n",
    "- Select times t = ${t_j}$\n",
    "- Get M simulated prices from `mtm_irs[t, i, M]`\n",
    "\n",
    "- Get currency from `irs_params['arr_0'][i]`\n",
    "- Get foreign rate and FX if currency $\\neq$ 0, else get domestic rate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cpty_idx = 0\n",
    "port_idx = []\n",
    "for i in range(len(irs_params['arr_0'])):\n",
    "    if (irs_params['arr_0'][i][5] == cpty_idx):\n",
    "        port_idx.append(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "port_idx = np.array(port_idx)\n",
    "T = np.shape(mtm_irs['arr_0'])[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.shape(mtm_irs['arr_0'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reset dates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reset_idx = (np.arange(101) - 1) // 5*5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M = 2000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Construct IRS mappings to rates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "irs_map = {}\n",
    "for i in port_idx:\n",
    "    ccy = irs_params['arr_0'][i][6]\n",
    "    if ccy == 0:\n",
    "        irs_map[i] = [0] # domestic rate\n",
    "    else:\n",
    "        irs_map[i] = [ccy, ccy+10] # foreign rate + FX rate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mtm = []\n",
    "rates_fx = []\n",
    "prev_rates_fx = []\n",
    "for t in np.arange(5, T, 10):     \n",
    "    mtm.append(mtm_irs['arr_0'][t, port_idx,:2*M])\n",
    "    \n",
    "    prev_rates_fx.append(rates_and_fx['arr_0'][t-5, :, :2*M].T) # previous reset date\n",
    "    \n",
    "    rates_fx.append(rates_and_fx['arr_0'][t, :, :2*M].T)\n",
    "\n",
    "nt = len(np.arange(1, T, 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This function can be used to sample the input space with the option method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def stratified_sampling(ar, M):\n",
    "    quantiles = [0, 0.25, 0.5, 0.75, 1]\n",
    "    num_buckets = len(quantiles) - 1\n",
    "    q = np.quantile(ar, quantiles, axis=0)\n",
    "    \n",
    "    idx = np.array([], dtype=int)\n",
    "    sum_ = 0\n",
    "    dim = np.shape(ar)[1]\n",
    "    num_samples = int(M / (num_buckets**dim))\n",
    "\n",
    "    if dim == 2: # domestic\n",
    "        for i in range(num_buckets):\n",
    "            for j in range(num_buckets):\n",
    "                idx_= np.where((ar[:,0] >= q[i,0]) & (ar[:,0] <= q[i+1,0]) \n",
    "                             & (ar[:,1] >= q[j,1]) & (ar[:,1] <= q[j+1,1]))\n",
    "                \n",
    "                if len(idx_[0]) >  num_samples:\n",
    "                    idx__ = idx_[0][:num_samples]\n",
    "                else: # oversample\n",
    "                    idx__ = np.random.choice(idx_[0], num_samples) # sample with replacement \n",
    "     \n",
    "                sum_+= len(idx__) \n",
    "                idx = np.append(idx, idx__)\n",
    "    else: \n",
    "        counter = 0\n",
    "        for i in range(num_buckets):\n",
    "            for j in range(num_buckets):\n",
    "                for k in range(num_buckets):\n",
    "                    idx_= np.where((ar[:, 0] >= q[i, 0]) & (ar[:, 0] <= q[i+1, 0]) \n",
    "                                 & (ar[:, 1] >= q[j, 1]) & (ar[:, 1] <= q[j+1, 1]) \n",
    "                                 & (ar[:, 2] >= q[k, 2]) & (ar[:, 2] <= q[k+1, 2]))\n",
    "                    if len(idx_[0]) > num_samples:\n",
    "                        idx__ = idx_[0][:num_samples]\n",
    "                        idx = np.append(idx, idx__)\n",
    "                        sum_ += len(idx__)\n",
    "                    elif len(idx_[0]) > 0: # oversample\n",
    "                        idx__ = np.random.choice(idx_[0], num_samples) # sample with replacement \n",
    "                        \n",
    "                        idx = np.append(idx,idx__)\n",
    "                        sum_ += len(idx__)\n",
    "                    else:\n",
    "                        counter += 1 \n",
    "                        print(i, j, k, 0, counter)\n",
    "    \n",
    "    # ensure that minimums and maximums are in the training set\n",
    "    idx_min = np.argmin(ar, axis=0)\n",
    "    idx_max = np.argmax(ar, axis=0)\n",
    "    \n",
    "    idx[0:dim] = idx_min\n",
    "    idx[dim:2*dim] = idx_max\n",
    "    return idx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## GP kernel specification"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sk_kernel_matern = Matern(length_scale=1.0, nu=0.5, length_scale_bounds=(0.01, 10000))\n",
    "sk_kernel_rbf = RBF(length_scale=1.0, length_scale_bounds=(0.01, 10000))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "portfolio = {}\n",
    "for idx in port_idx:\n",
    "    portfolio[idx] = {'GPs':np.array([0]*nt, dtype=object), 'weight':1, \n",
    "                      'min':np.array([0]*nt, dtype=object), \n",
    "                      'max':np.array([0]*nt, dtype=object), \n",
    "                      'min_sc_x':np.array([0]*nt, dtype=object),\n",
    "                      'max_sc_x':np.array([0]*nt, dtype=object),\n",
    "                      'min_sc_y':np.array([0]*nt, dtype=object), \n",
    "                      'max_sc_y':np.array([0]*nt, dtype=object)\n",
    "                     }\n",
    "\n",
    "gps = []\n",
    "for i in range(nt):\n",
    "    for j in range(20):\n",
    "        n_rates = len(irs_map[port_idx[j]])\n",
    "        if i > 0:\n",
    "            rates_ = np.zeros([2*M, n_rates+1])  \n",
    "            rates_[:, 0] = prev_rates_fx[i][:, irs_map[port_idx[j]][0]]\n",
    "            rates_[:, 1:] = rates_fx[i][:, irs_map[port_idx[j]]] \n",
    "        else:\n",
    "            rates_ = np.zeros([2*M,n_rates])  \n",
    "            rates_ = rates_fx[i][:, irs_map[port_idx[j]]] \n",
    "        idx = stratified_sampling(rates_, M)\n",
    "        print(len(idx))  \n",
    "        gp = gaussian_process.GaussianProcessRegressor(kernel=sk_kernel_matern, n_restarts_optimizer=20)\n",
    "        \n",
    "        # Rescale x and y to the unit interval\n",
    "        rates_sc_ = (rates_ - np.min(rates_, axis=0)) / (np.max(rates_, axis=0) - np.min(rates_, axis=0) + 1e-16)\n",
    "        irs_prices = mtm[i][j, :]\n",
    "          \n",
    "        if np.max(irs_prices) - np.min(irs_prices) != 0:  \n",
    "            irs_prices_sc_ = (irs_prices - np.min(irs_prices)) / (np.max(irs_prices) - np.min(irs_prices))\n",
    "        else:\n",
    "            irs_prices_sc_ = irs_prices - np.min(irs_prices)\n",
    "        \n",
    "        portfolio[port_idx[j]]['GPs'][i] = gp.fit(rates_sc_[idx, :], irs_prices_sc_[idx]) \n",
    "        portfolio[port_idx[j]]['min'][i] = np.min(rates_[idx, :], axis=0)\n",
    "        portfolio[port_idx[j]]['max'][i] = np.max(rates_[idx, :], axis=0)\n",
    "        portfolio[port_idx[j]]['min_sc_x'][i] = np.min(rates_, axis=0)\n",
    "        portfolio[port_idx[j]]['max_sc_x'][i] = np.max(rates_, axis=0)\n",
    "        portfolio[port_idx[j]]['min_sc_y'][i] = np.min(irs_prices)\n",
    "        portfolio[port_idx[j]]['max_sc_y'][i] = np.max(irs_prices)\n",
    "          \n",
    "        print(i, j)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# CVA simulation using the credit default model described in the above reference.\n",
    "def CVA_simulation(sim_params, model_params, def_model):\n",
    "    \n",
    "    M        = sim_params['M']        # number of paths\n",
    "    nt       = sim_params['nt']       # number of exposure dates \n",
    "    r        = model_params['r']\n",
    "    T        = model_params['T'] \n",
    "    t0       = model_params['t0'] \n",
    "    haz_rate = model_params['lambda']\n",
    "    n_instruments = model_params['size']\n",
    "    \n",
    "    pi = {}\n",
    "    pi['tilde'] = np.array([0.0]*nt*M, dtype='float32').reshape(nt, M)     # GP portfolio value\n",
    "    pi['exact'] = np.array([0.0]*nt*M, dtype='float32').reshape(nt, M)     # BS portfolio value\n",
    "    pi['tilde_var'] = np.array([0.0]*nt*M, dtype='float32').reshape(nt, M) # GP portfolio variance\n",
    "    dPD = np.array([0.0]*nt*M, dtype='float32').reshape(nt, M)             # default probabilities\n",
    "    dt = T/nt \n",
    "    \n",
    "    for i in range(nt): # len(mtm) \n",
    "      \n",
    "        pred_ = 0\n",
    "        var_ = 0\n",
    "        obs_ = 0 \n",
    "\n",
    "        for j in range(n_instruments): # loop over instruments\n",
    "            idx = np.random.choice(range(2*M), M)\n",
    "            irs_prices = mtm[i][j, idx] # test samples\n",
    "            rates = rates_fx[i][:, irs_map[port_idx[j]]]  # test samples\n",
    "            prev_rates = prev_rates_fx[i][:, irs_map[port_idx[j]][0]]  # test samples\n",
    "               \n",
    "            n_rates = np.shape(rates)[1]\n",
    "            \n",
    "            key = port_idx[j]\n",
    "            \n",
    "            # Avoid simulated rates/FX breaching boundaries of training set\n",
    "            if i > 0:\n",
    "                dim = n_rates + 1\n",
    "                test_rates = np.zeros([M, dim])  \n",
    "                test_rates[:, 0] = prev_rates[idx]\n",
    "                test_rates[:, 1:] = rates[idx, :] \n",
    "            else:\n",
    "                dim = n_rates  \n",
    "                test_rates = np.zeros([M, dim])  \n",
    "                test_rates = rates[idx, :] \n",
    "                \n",
    "            # Rescale x and y to the unit interval\n",
    "            for k in range(dim):\n",
    "                idx_min = np.where(test_rates[:, k] < portfolio[key]['min'][i][k])\n",
    "                idx_max = np.where(test_rates[:, k] > portfolio[key]['max'][i][k])\n",
    "                if len(idx_min[0]) > 0:\n",
    "                    test_rates[idx_min, k] = portfolio[key]['min'][i][k]\n",
    "                if len(idx_max[0]) > 0:\n",
    "                    test_rates[idx_max, k] = portfolio[key]['max'][i][k]\n",
    "            min_ = portfolio[port_idx[j]]['min_sc_x'][i]\n",
    "            max_ = portfolio[port_idx[j]]['max_sc_x'][i]\n",
    "            \n",
    "            test_rates_sc = (test_rates - min_) / (max_ - min_)\n",
    "            \n",
    "            pred_sc, std = portfolio[key]['GPs'][i].predict(test_rates_sc, return_std=True) \n",
    "            min_y_ = portfolio[port_idx[j]]['min_sc_y'][i]\n",
    "            max_y_ = portfolio[port_idx[j]]['max_sc_y'][i]\n",
    "            \n",
    "            pred = min_y_ + pred_sc * (max_y_ - min_y_)\n",
    "            std *= (max_y_ - min_y_)\n",
    "            pred_ += portfolio[key]['weight'] * pred\n",
    "            var_ += (portfolio[key]['weight'] * std)**2 \n",
    "            obs_ += portfolio[key]['weight'] * irs_prices\n",
    "\n",
    "        pi['tilde'][i, :] = pred_\n",
    "        pi['exact'][i, :] = obs_\n",
    "        pi['tilde_var'][i, :] = var_ \n",
    "        \n",
    "        # compute default probabilities  \n",
    "        dPD[i, :]= haz_rate * np.exp(-haz_rate*dt)#gamma[i-1,m]*exp_factor\n",
    "\n",
    "    CVA = {}\n",
    "    CVA['tilde'] = 0\n",
    "    CVA['exact'] = 0\n",
    "    CVA['exact_up'] = 0\n",
    "    CVA['exact_down'] = 0\n",
    "    CVA['tilde_up'] = 0\n",
    "    CVA['tilde_down'] = 0\n",
    "      \n",
    "    for i in range(nt): \n",
    "        time = i * dt\n",
    "        mu_tilde = np.mean(dPD[i, :] * pi['tilde'][i, :]) * np.exp(-r*(time-t0)) * dt\n",
    "        CVA['tilde'] += mu_tilde\n",
    "        \n",
    "        std_tilde_err = np.std(dPD[i, :] * pi['tilde'][i, :]) * np.exp(-r*(time-t0)) * dt/sqrt(M)\n",
    "        CVA['tilde_up'] += mu_tilde + 2*std_tilde_err\n",
    "        CVA['tilde_down'] += mu_tilde - 2*std_tilde_err\n",
    "        \n",
    "        mu_exact = np.mean(dPD[i, :] * pi['exact'][i, :]) * np.exp(-r*(time-t0))*dt\n",
    "        CVA['exact'] += mu_exact\n",
    "        \n",
    "        std_exact_err = np.std(dPD[i, :] * pi['exact'][i, :]) * np.exp(-r*(time-t0)) * dt/sqrt(M)\n",
    "        CVA['exact_up'] += mu_exact + 2*std_exact_err\n",
    "        CVA['exact_down'] += mu_exact - 2*std_exact_err\n",
    "        \n",
    "    CVA['tilde'] *= (1-def_model['recovery'])\n",
    "    CVA['tilde_up'] *= (1-def_model['recovery'])\n",
    "    CVA['tilde_down'] *= (1-def_model['recovery'])\n",
    "    CVA['exact'] *= (1-def_model['recovery'])\n",
    "    CVA['exact_up'] *= (1-def_model['recovery'])\n",
    "    CVA['exact_down'] *= (1-def_model['recovery'])\n",
    "        \n",
    "    return CVA, pi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def_model = {}\n",
    "def_model['recovery'] = 0.4 # recovery rate\n",
    "\n",
    "sim_params = {}\n",
    "sim_params['M'] = 1000   # Number of simulations  \n",
    "sim_params['nt'] = 10     # Number of exposure dates\n",
    "    \n",
    "model_params= {}   \n",
    "model_params['r'] = 0.02\n",
    "model_params['T'] = 10.0 # Longest dated cash flow maturity in the portfolio \n",
    "model_params['t0'] = 0\n",
    "model_params['lambda'] = 0.05 # constant hazard rate\n",
    "model_params['size'] = 20 # number of instruments in portfolio\n",
    "\n",
    "CVA_0, pi_0 = CVA_simulation(sim_params, model_params, def_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "CVA_0_err = 100 * (CVA_0['exact'] - CVA_0['tilde']) / CVA_0['exact']\n",
    "print('% error in CVA_0:', CVA_0_err)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare the exact CVA, i.e. using BS prices, with the GP portfolio and provide the GP error bounds "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "CVA_0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "epe_gp = np.mean(pi_0['tilde'], axis=1)\n",
    "epe_obs = np.mean(pi_0['exact'], axis=1)\n",
    "epe_var = np.mean(pi_0['tilde_var'], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Error plot in expected positive exposure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize = (10, 6), facecolor='white', edgecolor='black')\n",
    "plt.plot(0.1 + np.arange(10), epe_gp, color = 'red', label = 'GP Prediction')\n",
    "plt.plot(0.1 + np.arange(10), epe_obs, color = 'black', label = 'Analytical Model')\n",
    "plt.fill_between(0.1 + np.arange(10), (epe_gp - 2.0 * np.sqrt(epe_var)), (epe_gp + 2.0 * np.sqrt(epe_var)), color = 'grey', alpha=0.3)\n",
    "plt.legend(loc = 'best', prop={'size':10})\n",
    "plt.xlabel('time (years)')\n",
    "plt.ylabel('EPE (Euro)');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
